{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 入门云原生AI - 1. 从mnist开始体验\n",
    "\n",
    "在这个示例中，我们将演示：\n",
    "\n",
    "* 下载并准备数据\n",
    "* 利用Arena提交单机训练任务,并且查看训练任务状态和日志，并通过TensorBoard查看训练任务\n",
    "* 为您的训练结果部署一个模型预测的在线服务。\n",
    "* 在Notebook中调用您的在线服务，验证模型准确率。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. 下载TensorFlow样例源代码到 /root/models 目录"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "mkdir: cannot create directory '/root/models': File exists\n",
      "fatal: destination path '/root/models/tensorflow-sample-code' already exists and is not an empty directory.\n"
     ]
    }
   ],
   "source": [
    "! mkdir  -p /root/models\n",
    "! git clone https://code.aliyun.com/xiaozhou/tensorflow-sample-code.git /root/models/tensorflow-sample-code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. 下载mnist数据到 /root/dataset/mnist"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n",
      "                                 Dload  Upload   Total   Spent    Left  Speed\n",
      "100 1610k    0 1610k    0     0  5225k      0 --:--:-- --:--:-- --:--:-- 5228k\n",
      "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n",
      "                                 Dload  Upload   Total   Spent    Left  Speed\n",
      "100  4542    0  4542    0     0  22307      0 --:--:-- --:--:-- --:--:-- 22374\n",
      "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n",
      "                                 Dload  Upload   Total   Spent    Left  Speed\n",
      "100 9680k    0 9680k    0     0  25.2M      0 --:--:-- --:--:-- --:--:-- 25.2M\n",
      "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n",
      "                                 Dload  Upload   Total   Spent    Left  Speed\n",
      "100 28881    0 28881    0     0   134k      0 --:--:-- --:--:-- --:--:--  135k\n"
     ]
    }
   ],
   "source": [
    "! mkdir -p /root/output\n",
    "! mkdir -p /root/dataset/mnist && \\\n",
    "  cd /root/dataset/mnist && \\\n",
    "  curl -O https://code.aliyun.com/xiaozhou/tensorflow-sample-code/raw/master/data/t10k-images-idx3-ubyte.gz && \\\n",
    "  curl -O https://code.aliyun.com/xiaozhou/tensorflow-sample-code/raw/master/data/t10k-labels-idx1-ubyte.gz && \\\n",
    "  curl -O https://code.aliyun.com/xiaozhou/tensorflow-sample-code/raw/master/data/train-images-idx3-ubyte.gz && \\\n",
    "  curl -O https://code.aliyun.com/xiaozhou/tensorflow-sample-code/raw/master/data/train-labels-idx1-ubyte.gz"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. 查看目录结构, 其中`dataset`是数据目录，`models`是模型代码目录，`output`是训练结果目录。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/root\n",
      "|-- dataset\n",
      "|   `-- mnist\n",
      "|       |-- t10k-images-idx3-ubyte.gz\n",
      "|       |-- t10k-labels-idx1-ubyte.gz\n",
      "|       |-- train-images-idx3-ubyte.gz\n",
      "|       `-- train-labels-idx1-ubyte.gz\n",
      "|-- models\n",
      "|   `-- tensorflow-sample-code\n",
      "|       |-- README.md\n",
      "|       |-- data\n",
      "|       |-- mnist-tf\n",
      "|       |-- models\n",
      "|       |-- mpijob\n",
      "|       `-- tfjob\n",
      "|-- output\n",
      "|-- public\n",
      "\n",
      "25 directories, 12 files\n"
     ]
    }
   ],
   "source": [
    "! tree -I ai-starter -L 3 /root"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.检查可用GPU资源"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "NAME                                IPADDRESS      ROLE    GPU(Total)  GPU(Allocated)\r\n",
      "cn-shanghai.i-uf60zgmfu7nvxxxxxxxx  192.168.0.46   <none>  1           0\r\n",
      "cn-shanghai.i-uf6hf2g6lwyvxxxxxxxx  192.168.0.203  <none>  0           0\r\n",
      "-----------------------------------------------------------------------------------------\r\n",
      "Allocated/Total GPUs In Cluster:\r\n",
      "0/1 (0%)  \r\n"
     ]
    }
   ],
   "source": [
    "! arena top node"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5.通过Arena提交训练任务\n",
    "\n",
    "5.1 可以根据您的需要设置JOB_NAME，建议在多人共同使用的时候，设置自己独有的JOB_NAME\n",
    "\n",
    "5.2 这里 `$USER_DATA_NAME` 代表您的Notebook 使用的共享存储，存储的根目录和Notebook中`/root`对应。 由集群管理员在部署工作环境时指定[部署数据科学家工作环境（Notebook）](../docs/setup/SETUP_NOTEBOOK.md)创建。 \n",
    "* `$USER_DATA_NAME` 是存放您私有数据的共享存储，文件内容对应Notebook的 `/root` 目录\n",
    "* `$PUBLIC_DATA_NAME` 是存放公共数据的共享存储，文件内容对应Notebook的 `/root/public` 目录。在arena的命令中，如果需要使用公共存储里的文件，可以指定参数 `--data=$PUBLIC_DATA_NAME:/public`，并在训练代码中指定容器使用`/public`目录里的代码或数据\n",
    "\n",
    "在下面的提交命令中：\n",
    "* `--data=$USER_DATA_NAME:/training` 表示将共享存储映射到训练任务的`/training`目录。\n",
    "* `/training`目录下的子目录`/training/models/tensorflow-sample-code` 就是步骤1拷贝源代码的位置\n",
    "* `/training`目录下的子目录`/training/dataset/mnist`就是步骤2下载数据的位置\n",
    "* `/training`目录下的子目录`/training/output`就是步骤3创建的训练结果输出的位置。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "env: JOB_NAME=tf-mnist\n",
      "configmap/tf-mnist-tfjob created\n",
      "configmap/tf-mnist-tfjob labeled\n",
      "service/tf-mnist-tensorboard created\n",
      "deployment.extensions/tf-mnist-tensorboard created\n",
      "tfjob.kubeflow.org/tf-mnist created\n",
      "\u001b[36mINFO\u001b[0m[0008] The Job tf-mnist has been submitted successfully \n",
      "\u001b[36mINFO\u001b[0m[0008] You can run `arena get tf-mnist --type tfjob` to check the job status \n"
     ]
    }
   ],
   "source": [
    "# Set the Job Name\n",
    "%env JOB_NAME=tf-mnist\n",
    "%env USER_DATA_NAME=training-data\n",
    "# Submit a training job \n",
    "# using code and data from Data Volume\n",
    "!arena submit tf \\\n",
    "             --name=$JOB_NAME \\\n",
    "             --gpus=1 \\\n",
    "             --data=$USER_DATA_NAME:/training \\\n",
    "             --tensorboard \\\n",
    "             --image=tensorflow/tensorflow:1.11.0-gpu-py3 \\\n",
    "             --logdir=/training/output/mnist \\\n",
    "             \"python /training/models/tensorflow-sample-code/tfjob/docker/mnist/main.py --max_steps 5000 --data_dir /training/dataset/mnist --log_dir /training/output/mnist\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> - `Arena`命令的`--logdir`指定`tensorboard`从训练任务的指定目录读取event  \n",
    "> - 完整参数可以参考[命令行文档](https://github.com/kubeflow/arena/blob/master/docs/cli/arena_submit_tfjob.md)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6.检查模型训练状态\n",
    "当任务状态从`Pending`转为`Running`后就可以查看日志和GPU使用率了。这里`-e`为了方便检查任务`Pending`的原因。通常看到`[Pulling] pulling image \"tensorflow/tensorflow:1.11.0-gpu-py3\"`代表容器镜像过大，导致任务处于`Pending`。这时可以重复执行下列命令直到任务状态变为`Running`。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "STATUS: RUNNING\n",
      "NAMESPACE: default\n",
      "TRAINING DURATION: 1s\n",
      "\n",
      "NAME      STATUS   TRAINER  AGE  INSTANCE          NODE\n",
      "tf-mnist  RUNNING  TFJOB    1s   tf-mnist-chief-0  192.168.0.31\n",
      "\n",
      "Your tensorboard will be available on:\n",
      "http://192.168.0.203:32116   \n",
      "\n",
      "Events: \n",
      "No events for pending pod\n"
     ]
    }
   ],
   "source": [
    "! arena get $JOB_NAME -e"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.实时检查日志\n",
    "此时可以通过调整`--tail=`的数值展示输出的行数。默认为显示全部日志。\n",
    "想要获取实时日志， 可以执行`-f`参数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2019-04-03T11:30:20.434833662Z WARNING:tensorflow:From /training/models/tensorflow-sample-code/tfjob/docker/mnist/main.py:41: read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.\r\n",
      "2019-04-03T11:30:20.434891738Z Instructions for updating:\r\n",
      "2019-04-03T11:30:20.434898475Z Please use alternatives such as official/mnist/dataset.py from tensorflow/models.\r\n",
      "2019-04-03T11:30:20.473209783Z WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:260: maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.\r\n",
      "2019-04-03T11:30:20.473235115Z Instructions for updating:\r\n",
      "2019-04-03T11:30:20.473239615Z Please write your own downloading logic.\r\n",
      "2019-04-03T11:30:20.483963934Z WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:262: extract_images (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.\r\n",
      "2019-04-03T11:30:20.483984281Z Instructions for updating:\r\n",
      "2019-04-03T11:30:20.483988225Z Please use tf.data to implement this functionality.\r\n",
      "2019-04-03T11:30:20.778228754Z WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:267: extract_labels (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.\r\n",
      "2019-04-03T11:30:20.778261288Z Instructions for updating:\r\n",
      "2019-04-03T11:30:20.778264876Z Please use tf.data to implement this functionality.\r\n",
      "2019-04-03T11:30:20.786090015Z WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:110: dense_to_one_hot (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.\r\n",
      "2019-04-03T11:30:20.78610894Z Instructions for updating:\r\n",
      "2019-04-03T11:30:20.786113407Z Please use tf.one_hot on tensors.\r\n"
     ]
    }
   ],
   "source": [
    "! arena logs --tail=50 $JOB_NAME"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 8. 查看实时训练的GPU使用情况"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INSTANCE NAME     GPU(Device Index)  GPU(Duty Cycle)  GPU(Memory MiB)  STATUS     NODE\r\n",
      "tf-mnist-chief-0  N/A                N/A              N/A              Succeeded  \r\n"
     ]
    }
   ],
   "source": [
    "! arena top job $JOB_NAME"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 9. 查看训练结果\n",
    "#### 9.1 通过TensorBoard查看训练趋势\n",
    "您可以使用 `192.168.1.117:30670` 访问 Tensorboard。如果您通过笔记本电脑无法直接访问 Tensorboard，可以考虑在您的笔记本电脑使用 `sshuttle`。例如：`sshuttle -r root@41.82.59.51 192.168.0.0/16`。其中`41.82.59.51`为集群内某个节点的外网IP，且该外网IP可以通过ssh访问。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "STATUS: SUCCEEDED\n",
      "NAMESPACE: default\n",
      "TRAINING DURATION: 13m\n",
      "\n",
      "NAME      STATUS     TRAINER  AGE  INSTANCE          NODE\n",
      "tf-mnist  SUCCEEDED  TFJOB    20m  tf-mnist-chief-0  N/A\n",
      "\n",
      "Your tensorboard will be available on:\n",
      "http://192.168.0.203:30532   \n"
     ]
    }
   ],
   "source": [
    "# show job detail\n",
    "! arena get $JOB_NAME"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](1-1-tensorboard.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 9.2. 查看模型训练产生的结果文件, 在`/root/output`下生成了训练结果.\n",
    "\n",
    "`/root/output/mnist` 目录中是训练过程中产生的checkpoint文件，代表训练结束时模型的变量状态。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/root/output\r\n",
      "|-- mnist\r\n",
      "|   |-- checkpoint\r\n",
      "|   |-- model.ckpt-4500.data-00000-of-00001\r\n",
      "|   |-- model.ckpt-4500.index\r\n",
      "|   |-- model.ckpt-4500.meta\r\n",
      "|   |-- model.ckpt-4600.data-00000-of-00001\r\n",
      "|   |-- model.ckpt-4600.index\r\n",
      "|   |-- model.ckpt-4600.meta\r\n",
      "|   |-- model.ckpt-4700.data-00000-of-00001\r\n",
      "|   |-- model.ckpt-4700.index\r\n",
      "|   |-- model.ckpt-4700.meta\r\n",
      "|   |-- model.ckpt-4800.data-00000-of-00001\r\n",
      "|   |-- model.ckpt-4800.index\r\n",
      "|   |-- model.ckpt-4800.meta\r\n",
      "|   |-- model.ckpt-4900.data-00000-of-00001\r\n",
      "|   |-- model.ckpt-4900.index\r\n",
      "|   |-- model.ckpt-4900.meta\r\n",
      "|   |-- test\r\n",
      "|   |   |-- events.out.tfevents.1554286572.tf-mnist-chief-0\r\n",
      "|   |   `-- events.out.tfevents.1554291023.tf-mnist-chief-0\r\n",
      "|   `-- train\r\n",
      "|       |-- events.out.tfevents.1554286571.tf-mnist-chief-0\r\n",
      "|       `-- events.out.tfevents.1554291021.tf-mnist-chief-0\r\n",
      "`-- mnist-model\r\n",
      "    `-- 1\r\n",
      "        |-- saved_model.pb\r\n",
      "        `-- variables\r\n",
      "\r\n",
      "6 directories, 21 files\r\n"
     ]
    }
   ],
   "source": [
    "! tree -I ai-starter -L 3 /root/output"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 10. 将训练过程中产生的checkpoint转换为模型文件\n",
    "checkpoint文件不能直接用于部署预测服务，需要进行一次模型导出，将checkpoint文件中的模型状态导出为可以被预测服务识别的模型文件。 我们可以通过arena 提交一个转换任务，声明将`output/mnist`目录下的checkpoint文件导出到`output/mnist-model`目录。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "configmap/export-model-tfjob created\n",
      "configmap/export-model-tfjob labeled\n",
      "tfjob.kubeflow.org/export-model created\n",
      "\u001b[36mINFO\u001b[0m[0008] The Job export-model has been submitted successfully \n",
      "\u001b[36mINFO\u001b[0m[0008] You can run `arena get export-model --type tfjob` to check the job status \n"
     ]
    }
   ],
   "source": [
    "! arena submit tf \\\n",
    "     --name=export-model \\\n",
    "     --workers=1 \\\n",
    "     --gpus=1 \\\n",
    "     --data=$USER_DATA_NAME:/training \\\n",
    "     --image=tensorflow/tensorflow:1.11.0-gpu-py3 \\\n",
    "     \"python /training/models/tensorflow-sample-code/tfjob/docker/mnist/export_model.py \\\n",
    "      --checkpoint_step=4900 \\\n",
    "     --checkpoint_path=/training/output/mnist /training/output/mnist-model/ \""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 10.1.查看模型导出任务的执行状态"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "STATUS: FAILED\r\n",
      "NAMESPACE: default\r\n",
      "TRAINING DURATION: 6s\r\n",
      "\r\n",
      "NAME          STATUS  TRAINER  AGE  INSTANCE              NODE\r\n",
      "export-model  FAILED  TFJOB    1m   export-model-chief-0  N/A\r\n"
     ]
    }
   ],
   "source": [
    "! arena get export-model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 10.2 导出任务执行完毕后，可以在`output/mnist-model`目录中看到导出的模型文件。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/root/output/mnist-model\r\n",
      "`-- 1\r\n",
      "    |-- saved_model.pb\r\n",
      "    `-- variables\r\n",
      "        |-- variables.data-00000-of-00001\r\n",
      "        `-- variables.index\r\n",
      "\r\n",
      "2 directories, 3 files\r\n"
     ]
    }
   ],
   "source": [
    "! tree -I ai-starter -L 3 /root/output/mnist-model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 11. 部署预测服务\n",
    "\n",
    "我们得到可以用于部署预测服务的模型文件后，可以通过`arena serve` 提交一个模型预测的在线任务。 \n",
    "\n",
    "* `--data=$USER_DATA_NAME:/training` 表示将共享存储目录挂载到预测服务的`/training`目录\n",
    "* `--modelPath=/training/output/mnist-model` 表示预测服务使用的模型文件目录，就是我们在步骤12中导出的模型文件位置"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "configmap/mnist-tf-serving created\n",
      "configmap/mnist-tf-serving labeled\n",
      "configmap/mnist-tensorflow-serving-cm created\n",
      "service/mnist-tensorflow-serving created\n",
      "deployment.extensions/mnist-tensorflow-serving created\n"
     ]
    }
   ],
   "source": [
    "! arena serve tensorflow \\\n",
    "    --servingName=mnist \\\n",
    "    --modelName=mnist \\\n",
    "    --image=tensorflow/serving:latest  \\\n",
    "    --data=$USER_DATA_NAME:/training \\\n",
    "    --modelPath=/training/output/mnist-model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 11.2查看预测服务\n",
    "\n",
    "我们可以查到到预测服务的部署状态，以及入口访问地址。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "NAME   TYPE        VERSION  STATUS   CLUSTER-IP\r\n",
      "mnist  Tensorflow           RUNNING  172.19.82.216\r\n"
     ]
    }
   ],
   "source": [
    "! arena serve list"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 12. 通过预测服务验证模型准确率"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: requests in /usr/local/lib/python3.5/dist-packages (2.21.0)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.5/dist-packages (from requests) (2019.3.9)\n",
      "Requirement already satisfied: idna<2.9,>=2.5 in /usr/local/lib/python3.5/dist-packages (from requests) (2.8)\n",
      "Requirement already satisfied: urllib3<1.25,>=1.21.1 in /usr/local/lib/python3.5/dist-packages (from requests) (1.24.1)\n",
      "Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.5/dist-packages (from requests) (3.0.4)\n",
      "\u001b[33mYou are using pip version 18.1, however version 19.0.3 is available.\n",
      "You should consider upgrading via the 'pip install --upgrade pip' command.\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "# 安装必要的python 库\n",
    "! pip install requests"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 12.1 定义函数，函数内容通过Http调用预测服务"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "以下代码定义一个python方法`pick_image_and_predict`，这个方法会从mnist的测试数据集中随机选择一张图片，作为请求的参数，通过http方式调用预测服务，得到模型预测的记过。这个方法执行后会同时打印图片的真实值，和通过预测服务推理得到的值。 您可以用于判断模型的准确率是否满足要求。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Extracting /root/dataset/mnist/train-images-idx3-ubyte.gz\n",
      "Extracting /root/dataset/mnist/train-labels-idx1-ubyte.gz\n",
      "Extracting /root/dataset/mnist/t10k-images-idx3-ubyte.gz\n",
      "Extracting /root/dataset/mnist/t10k-labels-idx1-ubyte.gz\n"
     ]
    }
   ],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import random\n",
    "import requests\n",
    "import json\n",
    "from tensorflow.examples.tutorials.mnist import input_data\n",
    "%matplotlib inline\n",
    "data_dir=\"/root/dataset/mnist/\"\n",
    "mnist = input_data.read_data_sets(data_dir, one_hot=True)\n",
    "test_images = mnist.test.images\n",
    "test_labels = mnist.test.labels\n",
    "digits = ['0','1','2','3','4','5','6','7','8','9']\n",
    "# -*- coding: utf-8 -*-\n",
    "\n",
    "def show(idx, title):\n",
    "  plt.figure()\n",
    "  plt.imshow(test_images[idx].reshape(28,28))\n",
    "  plt.axis('off')\n",
    "  plt.title('\\n\\n{}'.format(title), fontdict={'size': 16})\n",
    "\n",
    "def predict(url, num):\n",
    "    test_cls = np.argmax(test_labels, axis=1)\n",
    "    show(num, 'The Picture is {}'.format(test_cls[num]))\n",
    "    headers = {\"content-type\": \"application/json\"}\n",
    "    metadata=requests.get(url + '/metadata')\n",
    "    data = json.dumps({\"signature_name\": \"predict_images\", \"dropout/Placeholder\": 1.0,\"inputs\": test_images[num].reshape(1, 784).tolist()})\n",
    "    json_response = requests.post(model_api+':predict', data=data, headers=headers)\n",
    "    scores = json.loads(json_response.text)['outputs']\n",
    "    predicted_digits_idx = np.argmax(scores)\n",
    "    print('预测识别的数字: {}'.format(digits[predicted_digits_idx]))\n",
    "    return scores\n",
    "\n",
    "def pick_image_and_predict(model_api):\n",
    "    random_image = random.randint(0,len(test_images)-1)\n",
    "    score = predict(model_api, random_image)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 12.2 调用预测服务\n",
    "我们可以得到真实值和预测值。 您可以多执行几次或者修改代码增加执行次数，比较预测结果和真是结果，判断模型的预测准确率。\n",
    "\n",
    "这里`mnist-tensorflow-serving` 代表预测服务的服务域名，您也可以改为IP地址，IP地址的获取在12步中，通过`arena serve list` 可以得到的预测服务的服务IP。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_api='http://mnist-tensorflow-serving:8501/v1/models/mnist'\n",
    "pick_image_and_predict(model_api)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  13. 删除已经完成的任务"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "service \"tf-mnist-tensorboard\" deleted\n",
      "deployment.extensions \"tf-mnist-tensorboard\" deleted\n",
      "tfjob.kubeflow.org \"tf-mnist\" deleted\n",
      "configmap \"tf-mnist-tfjob\" deleted\n",
      "\u001b[36mINFO\u001b[0m[0006] The Job tf-mnist has been deleted successfully \n",
      "tfjob.kubeflow.org \"export-model\" deleted\n",
      "configmap \"export-model-tfjob\" deleted\n",
      "\u001b[36mINFO\u001b[0m[0006] The Job export-model has been deleted successfully \n",
      "configmap \"mnist-tensorflow-serving-cm\" deleted\n",
      "service \"mnist-tensorflow-serving\" deleted\n",
      "deployment.extensions \"mnist-tensorflow-serving\" deleted\n",
      "configmap \"mnist-tf-serving\" deleted\n",
      "\u001b[36mINFO\u001b[0m[0002] The Serving job mnist has been deleted successfully \n"
     ]
    }
   ],
   "source": [
    "# delete job\n",
    "! arena delete $JOB_NAME\n",
    "! arena delete export-model\n",
    "# delete serving job\n",
    "! arena serve delete mnist"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "恭喜！您已经使用 `arena` 成功运行了训练作业，而且还能轻松检查 Tensorboard。\n",
    "\n",
    "总结，希望您通过本次演示了解：\n",
    "1. 如何准备代码和数据，并将其放入数据卷中\n",
    "2. 如何在训练任务中引用数据卷，并且使用其中的代码和数据\n",
    "3. 如何利用arena管理您的训练任务。\n",
    "4. 为您的训练结果部署一个模型预测的在线服务。\n",
    "5. 在Notebook中调用您的在线服务，验证模型准确率。\n",
    "\n",
    "以上是使用`Arena`在云上进行模型训练的例子，您可以通过修改代码`/root/models/tensorflow-sample-code/tfjob/docker/mnist/main.py`重新提交，实现迭代的模型开发目的。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
